java.lang.Object
  |
  +--com.doclinx.ftxml.SRC2STF_PARMS
Parameter block class that controls optional functionality during the text parsing phase. The TeraXML system can handle several data formats. XML is the primary format supported in the Java version. The SRC2STF_PARMS class provides a record structure that contains several options for controlling the parsing task.
See Also: catSetParms, catAddFile
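As a rough illustration of the "record structure" idea, the sketch below fills in a few of the documented public fields before a parse. `StandInParms` is a hypothetical stand-in that mirrors only some field names from this page so the example compiles without the TeraXML jar; the values chosen (including the `"UTF-8"` encoding name) are assumptions, and real code would use `com.doclinx.ftxml.SRC2STF_PARMS` directly.

```java
// A minimal sketch, not the TeraXML implementation.
class ParmsSketch {
    // Hypothetical stand-in for com.doclinx.ftxml.SRC2STF_PARMS.
    // Only a handful of the documented public fields are mirrored here.
    static class StandInParms {
        public String sr_stfFile;           // where to put the output token file
        public String sr_encoding;          // text encoding to use (no detection)
        public boolean sr_debug;            // parser debug info output
        public boolean ht_stopOnFileError;  // stop if encountering a parse error
        public int sr_maxWordChars;         // maximum word length (255 max)
    }

    // Populate the parameter block; the word length is capped at the documented 255.
    static StandInParms configure(String stfPath, int maxWordChars) {
        StandInParms p = new StandInParms();
        p.sr_stfFile = stfPath;
        p.sr_encoding = "UTF-8";            // assumed encoding name
        p.sr_debug = true;
        p.ht_stopOnFileError = true;
        p.sr_maxWordChars = Math.min(maxWordChars, 255);
        return p;
    }

    public static void main(String[] args) {
        StandInParms p = configure("out.stf", 300);
        System.out.println(p.sr_stfFile + " " + p.sr_encoding + " " + p.sr_maxWordChars);
    }
}
```

With the real class, the populated object would then be handed to the catalog API (see catSetParms and catAddFile); those calls are omitted here because their signatures are not shown on this page.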
Field Summary
Type | Field | Description
static int | ALT_DRI | Default alternate DRI for hidden info.
int | dpapi_error | Error code while building index.
java.lang.String | ht_defFile | Deprecated.
int | ht_docIdStart | Deprecated.
int | ht_documentsProcessed | Number of documents processed.
boolean | ht_inputIsListOfFiles | Deprecated.
boolean | ht_stopOnFileError | Stop if encountering parse error.
boolean | ht_warnUnknown | Warn about unknown GIDs.
static int | IS_ADD | CatalogItem attrs field value.
static int | IS_ARCH1ST | CatalogItem attrs field value.
static int | IS_ARCHN | CatalogItem attrs field value.
static int | IS_AUTOTYPE | catAddFile method filter parameter type.
static int | IS_COMP1ST | CatalogItem attrs field value.
static int | IS_COMPN | CatalogItem attrs field value.
static int | IS_DELETED | CatalogItem attrs field value.
static int | IS_ENTITY_EXTRACTED | CatalogItem attrs field value.
static int | IS_FILTER | catAddFile method filter parameter mask.
static int | IS_FILTER1 | Alternate name for IS_HTML.
static int | IS_FILTER2 | Alternate name for IS_XML.
static int | IS_FILTER3 | Alternate name for IS_GENERIC.
static int | IS_FILTER4 | Alternate name for IS_TEXT.
static int | IS_FILTER5 | Alternate name for IS_AUTOTYPE.
static int | IS_FILTERED | CatalogItem attrs field value.
static int | IS_GENERIC | catAddFile method filter parameter type (C++ only).
static int | IS_HTML | catAddFile method filter parameter type.
static int | IS_PRIMARY | CatalogItem attrs field value.
static int | IS_TEXT | catAddFile method filter parameter type.
static int | IS_UPDATE | CatalogItem attrs field value.
static int | IS_XML | catAddFile method filter parameter type.
java.lang.String | sr_addedText | Other added (non-indexed) text.
java.lang.String | sr_altTitle | Alternate title.
static int | SR_APPEND_DATES | sr_flags bit setting: append normalized dates to added text field.
boolean | sr_appendToOutput | Append to output (else overwrite).
com.doclinx.jftr.CharProp | sr_charProp | Internal use only (not user parameter).
byte | sr_contextAaidx | Context aaidx value (context attr).
java.lang.String | sr_dateFormats | Date formats for parsing, given as Java SimpleDateFormat format strings, each in double quotes and delimited by ';' (format: "fmt1";"fmt2";"fmt3"). The 4 defaults are: "MM/dd/yyyy";"MMMM dd,yyyy";"yyyyMMdd";"MM/dd/yyyy HH:mm:ss".
static int | SR_DDOC | sr_flags bit setting: disable built-in DOC filter.
boolean | sr_debug | Turn on parser debug info output.
static int | SR_DOC | File type value.
static int | SR_DOCCONTEXT | sr_flags bit setting.
static int | SR_DPDF | sr_flags bit setting: disable built-in PDF filter.
boolean | sr_enableJapanese | Deprecated.
java.lang.String | sr_encoding | Text encoding to use (no detect).
static int | SR_EXCLUDE_XMLATTR | sr_flags bit setting.
java.lang.String | sr_excludeList | File exclude list. Exclude format: *.xml;foo.
com.doclinx.ftxml.AppParms | sr_f1 | Parameter callback information: additional application data for a document.
com.doclinx.ftxml.InputCallback | sr_f2 | Input callback function: open InputStream for reading.
int | sr_filter | Internal use only (not user parameter).
int | sr_flags | SR_FLAGS (filter options), see flag values.
int | sr_foldSettings | Control bits for case folding.
static int | SR_GENERIC | File type value.
static int | SR_GENERICCONTEXT | sr_flags bit setting.
static int | SR_GENERICIDS | sr_flags bit setting.
java.lang.String | sr_genericRoot | Optional generic filter ROOT tag.
static int | SR_GENROOT_TYPE | sr_flags bit setting.
java.lang.String | sr_globalParms | Global user parameter data (use XML style tag).
java.lang.String | sr_gpConfig | Internal use only (not user parameter).
java.lang.String | sr_gpDll | Internal use only (not user parameter).
com.doclinx.ftxml.GFilter | sr_gpFilter | Internal use only (not user parameter).
static int | SR_HASCONTEXT | sr_flags bit setting mask.
static int | SR_HTML | Return code indicating type of file found.
static int | SR_HTML_HTM_CONTEXT | sr_flags bit setting: limit HTML context to <html>, <title>, and <meta>.
static int | SR_HTMLCONTEXT | sr_flags bit setting.
static int | SR_INCLUDE_CDDATA | sr_includeWords parameter field values.
static int | SR_INCLUDE_COLL_HDR | sr_includeWords bit setting.
static int | SR_INCLUDE_HTMLATTR | sr_flags bit setting.
static int | SR_INCLUDEDOCTYPE | Permitted values for the sr_flags field.
java.lang.String | sr_includeList | File include list. Include format: *.xml;foo.
boolean | sr_includePunctuation | Place punctuation tokens in STF.
int | sr_includeWords | Control bits for word inclusion.
boolean | sr_indexAltTitle | true if indexing alt title.
boolean | sr_indexModTime | Index file modified time.
boolean | sr_indexURL | Index URL text (set in map file).
java.lang.String | sr_JDBCDoc | Default JDBC Document wrapper.
static int | SR_KEEPXMLFROMPDF | sr_flags bit setting.
com.doclinx.jftr.Log | sr_logFile | Conversion information log file.
java.lang.Object | sr_map8 | Mapper for 8-bit encodings.
java.lang.String | sr_mapDirectory | Map directory for map files (.txt).
int | sr_maxWordChars | Maximum length of a word (255 max).
com.doclinx.ftxml.FileTime | sr_modTime | Internal use only (not user parameter).
static int | SR_NOSPANSCRIPT | sr_flags bit setting.
java.lang.String | sr_outputFile | Internal use only (not user parameter).
static int | SR_PARMCONTEXT | sr_flags bit setting.
static int | SR_PDF | File type value.
static int | SR_PDF_CONTENTORDER | sr_flags bit setting: enable raw order interpretation of PDF.
static int | SR_PDF_HILITE | sr_flags bit setting: enable built-in PDF filter to collect hilite info.
static int | SR_PDF_PHYSORDER | sr_flags bit setting: enable physical order interpretation of PDF.
static int | SR_PDFCONTEXT | sr_flags bit setting.
java.lang.String | sr_processFile | Where to put catAddFile process file.
static int | SR_PROMOTE_ALTTITLE | sr_flags bit setting: promote alternate title when title empty.
java.lang.String | sr_regExpression | Word break regular expression (C++ only).
static int | SR_REMOVE_FILEEXT | sr_flags bit setting: remove file extension from file names in catalog.
static int | SR_REMOVE_FILEPATH | sr_flags bit setting: remove path from file names in catalog.
static int | SR_SET_WORDBRK_EXT | sr_flags bit setting: set word break by file extension.
java.lang.String | sr_stfFile | Where to put output token file.
static int | SR_TEXT | File type value.
java.lang.String | sr_URL | URL text.
static int | SR_USE_ALTDRI | sr_flags bit setting: place any added text into ALT_DRI.
static int | SR_USEDLLCALLBACK | sr_flags bit setting.
java.lang.String | sr_vsdf | Internal use only (not user parameter).
static int | SR_XML | File type value.
static int | SR_XMLCONTEXT | sr_flags bit setting.
static int | SR_XMLSTRICT | sr_flags bit setting.
Constructor Summary
SRC2STF_PARMS() | Constructor with default values for parse control parameters.
Method Summary
java.lang.String | toString()
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
public static final int SR_INCLUDEDOCTYPE
Permitted values for the sr_flags field:
SR_INCLUDEDOCTYPE - Include document type integer in DRI 2 of the parsed output (STF). For all parser types.
SR_NOSPANSCRIPT - Include words found in HTML JavaScript blocks.
SR_XMLSTRICT - Return parser error if an XML file does not begin with .
SR_GENROOT_TYPE - Generate a "Root" name from the numeric type of a document. This places each file parameter set into a group organized by document type. NOTE: this option is currently only used in the C++ version.
SR_HTMLCONTEXT - Build context tree if HTML (not recommended when HTML is not "well-formed").
SR_XMLCONTEXT - Build context tree for XML files.
SR_GENERICCONTEXT - Build context tree when using generic filter (all other supported file types).
SR_PARMCONTEXT - Enable context for application passed parameters.
SR_GENERICIDS - Use sr_contextAaidx value as attribute for values generated from generic tag values. Do not combine with use of context trees. NOTE: this option is currently only used in the C++ version.
SR_USEDLLCALLBACK - C++ version ONLY.
SR_INCLUDE_HTMLATTR - Include HTML tag attributes (default is OFF).
SR_EXCLUDE_XMLATTR - Exclude XML attributes (default is ON).
SR_REMOVE_FILEPATH - Remove path prefix from file name stored in catalog.
SR_REMOVE_FILEEXT - Remove file extension from file name stored in catalog.
SR_DPDF - Disable built-in PDF filter.
SR_PDF_HILITE - Enable built-in PDF filter to collect hilite info.
SR_PDF_CONTENTORDER - Read PDF in content (raw) order. Default: reading order.
SR_PDF_PHYSORDER - Read PDF in physical order. Default: reading order.
SR_DDOC - Disable built-in DOC filter.
SR_HTML_HTM_CONTEXT - Limit HTML context to <html>, <title>, and <meta>.
SR_SET_WORDBRK_EXT - Get word break map file based upon file extension.
SR_PROMOTE_ALTTITLE - Use alternate title for title (if empty).
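Since these values are described as sr_flags bit settings, they would typically be combined with bitwise OR and tested with bitwise AND. The sketch below shows that pattern. The numeric values are hypothetical placeholders (this page does not list the actual constants), and the three local constants only borrow names from SRC2STF_PARMS so the example runs on its own.

```java
// A minimal sketch of combining and testing bit flags; not the TeraXML implementation.
class FlagsSketch {
    // Hypothetical bit positions, chosen only for illustration.
    // Real code would reference the constants on SRC2STF_PARMS instead.
    static final int SR_XMLCONTEXT = 1 << 0;
    static final int SR_REMOVE_FILEPATH = 1 << 1;
    static final int SR_DPDF = 1 << 2;

    // True when every bit in 'bit' is set in 'flags'.
    static boolean isSet(int flags, int bit) {
        return (flags & bit) == bit;
    }

    static int buildFlags() {
        int flags = 0;
        flags |= SR_XMLCONTEXT | SR_REMOVE_FILEPATH; // turn two options on
        flags |= SR_DPDF;                            // turn a third option on
        flags &= ~SR_REMOVE_FILEPATH;                // turn one option back off
        return flags;
    }

    public static void main(String[] args) {
        int flags = buildFlags();
        System.out.println("XML context:  " + isSet(flags, SR_XMLCONTEXT));
        System.out.println("remove path:  " + isSet(flags, SR_REMOVE_FILEPATH));
        System.out.println("PDF disabled: " + isSet(flags, SR_DPDF));
    }
}
```

With the real class, the result of such an expression would be assigned to the public sr_flags field.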
public static final int SR_NOSPANSCRIPT
public static final int SR_XMLSTRICT
public static final int SR_GENROOT_TYPE
public static final int SR_HTMLCONTEXT
public static final int SR_XMLCONTEXT
public static final int SR_GENERICCONTEXT
public static final int SR_PARMCONTEXT
public static final int SR_PDFCONTEXT
public static final int SR_GENERICIDS
public static final int SR_USEDLLCALLBACK
public static final int SR_INCLUDE_HTMLATTR
public static final int SR_EXCLUDE_XMLATTR
public static final int SR_USE_ALTDRI
public static final int SR_APPEND_DATES
public static final int SR_REMOVE_FILEPATH
public static final int SR_REMOVE_FILEEXT
public static final int SR_DPDF
public static final int SR_PDF_HILITE
public static final int SR_PDF_CONTENTORDER
public static final int SR_PDF_PHYSORDER
public static final int SR_DDOC
public static final int SR_DOCCONTEXT
public static final int SR_KEEPXMLFROMPDF
public static final int SR_HTML_HTM_CONTEXT
public static final int SR_SET_WORDBRK_EXT
public static final int SR_PROMOTE_ALTTITLE
public static final int SR_HASCONTEXT
public static final int SR_HTML
public static final int SR_XML
public static final int SR_TEXT
public static final int SR_GENERIC
public static final int SR_PDF
public static final int SR_DOC
public static int ALT_DRI
public static final int IS_FILTER1
public static final int IS_HTML
public static final int IS_FILTER2
public static final int IS_XML
public static final int IS_FILTER3
public static final int IS_GENERIC
public static final int IS_FILTER4
public static final int IS_TEXT
public static final int IS_FILTER5
public static final int IS_AUTOTYPE
public static final int IS_FILTER
public static final int IS_DELETED
CatalogItem attrs field value.
public static final int IS_ADD
CatalogItem attrs field value.
public static final int IS_PRIMARY
CatalogItem attrs field value.
public static final int IS_UPDATE
CatalogItem attrs field value.
public static final int IS_FILTERED
CatalogItem attrs field value.
public static final int IS_ARCH1ST
CatalogItem attrs field value.
public static final int IS_ARCHN
CatalogItem attrs field value.
public static final int IS_COMP1ST
CatalogItem attrs field value.
public static final int IS_COMPN
CatalogItem attrs field value.
public static final int IS_ENTITY_EXTRACTED
CatalogItem attrs field value.
public static final int SR_INCLUDE_CDDATA
sr_includeWords parameter field values:
SR_INCLUDE_CDDATA - If bit set, include CDDATA words in parse.
SR_INCLUDE_COLL_HDR - If bit set, and processing an XML composite file, include words from the encapsulating header tag(s) for each composite file.
public static final int SR_INCLUDE_COLL_HDR
public java.lang.String sr_stfFile
public java.lang.String sr_processFile
public com.doclinx.jftr.Log sr_logFile
public int sr_flags
SR_FLAGS (filter options); see the flag values listed under SR_INCLUDEDOCTYPE.
public byte sr_contextAaidx
public java.lang.String sr_genericRoot
public boolean sr_appendToOutput
public boolean sr_includePunctuation
public boolean sr_debug
public boolean sr_enableJapanese
public int sr_maxWordChars
public java.lang.String sr_regExpression
public int sr_foldSettings
public int sr_includeWords
public java.lang.String sr_encoding
public java.lang.String sr_altTitle
public boolean sr_indexAltTitle
public java.lang.String sr_addedText
public java.lang.String sr_JDBCDoc
public java.lang.String sr_URL
public boolean sr_indexURL
public boolean sr_indexModTime
public java.lang.String sr_includeList
public java.lang.String sr_excludeList
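The field summary gives the list format for sr_includeList and sr_excludeList as `*.xml;foo`, that is, patterns separated by ';'. How TeraXML evaluates these lists is not described on this page, so the sketch below is only one plausible reading: each entry is treated as a glob matched against a bare file name, an empty include list accepts everything, and the exclude list wins over the include list. The class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// A minimal sketch under the assumptions stated above; not the TeraXML implementation.
class FileListSketch {
    // True when fileName matches at least one ';'-separated glob in list.
    static boolean matchesAny(String list, String fileName) {
        if (list == null || list.isEmpty()) {
            return false;
        }
        for (String raw : list.split(";")) {
            String pattern = raw.trim();
            if (pattern.isEmpty()) {
                continue;
            }
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(Paths.get(fileName))) {
                return true;
            }
        }
        return false;
    }

    // Assumed policy: an empty include list accepts all files; exclusion takes precedence.
    static boolean accept(String includeList, String excludeList, String fileName) {
        boolean included = includeList == null || includeList.isEmpty()
                || matchesAny(includeList, fileName);
        return included && !matchesAny(excludeList, fileName);
    }

    public static void main(String[] args) {
        System.out.println(accept("*.xml;foo", "", "report.xml"));
        System.out.println(accept("*.xml;foo", "", "notes.txt"));
        System.out.println(accept("", "*.xml;foo", "foo"));
    }
}
```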
public java.lang.Object sr_map8
public java.lang.String sr_mapDirectory
public com.doclinx.ftxml.AppParms sr_f1
Parameter callback information: additional application data for a document. See the AppParms class for more details.
public com.doclinx.ftxml.InputCallback sr_f2
Input callback function: opens an InputStream for reading. See the InputCallback class for more details.
public java.lang.String sr_globalParms
Global user parameter data (use XML style tag). Format: <GTAG p1='w w w' p2='wx w w'>. For this example, the search path would be "wx in xpath /GTAG/@p2". This data is included and repeated for EVERY document processed using the catAddFile() method. User data can be set on a document-by-document basis using the parameter callback function. See the AppParms class for more details on user data.
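A small helper can assemble a string in the `<GTAG p1='...' p2='...'>` shape shown for sr_globalParms. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and because this page does not say how quotes or markup characters inside values are handled, the helper simply rejects them rather than guessing at an escaping scheme.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch; not part of the TeraXML API.
class GlobalParmsSketch {
    // Build "<TAG name='value' ...>" with attributes in insertion order.
    static String buildGlobalParms(String tag, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String value = e.getValue();
            // Escaping rules are not documented here, so refuse risky characters.
            if (value.indexOf('\'') >= 0 || value.indexOf('<') >= 0 || value.indexOf('&') >= 0) {
                throw new IllegalArgumentException("unsupported character in value: " + value);
            }
            sb.append(' ').append(e.getKey()).append("='").append(value).append('\'');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("p1", "w w w");
        attrs.put("p2", "wx w w");
        // Prints: <GTAG p1='w w w' p2='wx w w'>
        System.out.println(buildGlobalParms("GTAG", attrs));
    }
}
```

With the real class, the returned string would be assigned to the public sr_globalParms field.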
public java.lang.String sr_dateFormats
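Allows users to specify date formats for parsing. Uses Java SimpleDateFormat format strings, each in double quotes and delimited by ';' (format: "fmt1";"fmt2";"fmt3"). The 4 defaults are: "MM/dd/yyyy";"MMMM dd,yyyy";"yyyyMMdd";"MM/dd/yyyy HH:mm:ss".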
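To make the sr_dateFormats string format concrete, the sketch below splits a `"fmt1";"fmt2";"fmt3"` style value into its quoted patterns and tries each one in order with `java.text.SimpleDateFormat`. This only illustrates how such a value could be interpreted; the class and method names are hypothetical, and the first-match-wins order, the US locale, and strict (non-lenient) parsing are assumptions rather than documented TeraXML behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A minimal sketch; not the TeraXML implementation.
class DateFormatsSketch {
    // The four defaults listed for sr_dateFormats.
    static final String DEFAULTS =
            "\"MM/dd/yyyy\";\"MMMM dd,yyyy\"; \"yyyyMMdd\";\"MM/dd/yyyy HH:mm:ss\"";

    // Extract each double-quoted pattern, ignoring the ';' delimiters and stray spaces.
    static List<String> splitFormats(String spec) {
        List<String> patterns = new ArrayList<>();
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(spec);
        while (m.find()) {
            patterns.add(m.group(1));
        }
        return patterns;
    }

    // Try each pattern in order; return the first successful parse, or null if none match.
    static Date parseFirst(String text, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false);
            try {
                return format.parse(text);
            } catch (ParseException ignored) {
                // Fall through to the next pattern.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> patterns = splitFormats(DEFAULTS);
        System.out.println(patterns);
        System.out.println(parseFirst("20040315", patterns));
        System.out.println(parseFirst("not a date", patterns));
    }
}
```

Note that `SimpleDateFormat.parse(String)` accepts input with trailing text, so with this ordering a value such as "03/15/2004 10:30:00" is matched by "MM/dd/yyyy" first and the time portion is ignored.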
public java.lang.String sr_outputFile
public int sr_filter
public java.lang.String sr_gpDll
public com.doclinx.ftxml.GFilter sr_gpFilter
public java.lang.String sr_gpConfig
public java.lang.String sr_vsdf
public com.doclinx.ftxml.FileTime sr_modTime
public com.doclinx.jftr.CharProp sr_charProp
public boolean ht_stopOnFileError
public boolean ht_warnUnknown
public boolean ht_inputIsListOfFiles
public int ht_docIdStart
public java.lang.String ht_defFile
public int ht_documentsProcessed
public int dpapi_error
Constructor Detail
public SRC2STF_PARMS()
Method Detail
public java.lang.String toString()
Overrides: toString in class java.lang.Object